Malaria is one of the oldest and deadliest diseases and has killed millions of people across history. Between 150 and 300 million lives in the 20th century [1], and 619,000 lives in 2021 [2], were claimed by the disease. Modern machine learning techniques, such as neural networks, can be used to diagnose the presence of infection in cells. Increasing diagnostic speed and accuracy this way will minimize the overdiagnosis of malaria and allow more attention for patients and families affected by the disease. This will reduce the economic and social burden while increasing the overall effectiveness of doctors, especially in underserved areas where resources are limited.
The best model, a convolutional neural network trained with data augmentation and regularization techniques, achieves 98.5% accuracy. Recall is 99% for parasitized instances, meaning the model minimizes false negatives, which is ideal for the healthcare diagnosis context -- false negatives in malaria diagnosis are the worst-case scenario. With just 542,514 parameters, the model features 3 convolutional layers for feature extraction, 2 MaxPooling layers, 3 dense layers, and a dropout layer after the first dense layer. The leaky ReLU activation function was used throughout, with the exception of the first convolutional layer, which used the hyperbolic tangent function. This lightweight design means easier deployment, on-the-ground training, and model updates in areas with fewer computing resources and more technological constraints.
Due to computing resource constraints, some larger models were not tested in depth. A pretrained model other than the VGG16 model tested here might yield higher accuracy. Additionally, altering the proposed model's architecture with different activation functions, or adding or removing feature extraction layers, may help boost performance, as may tuning the learning rate.
Furthermore, additional data augmentation strategies, such as Gaussian blurring, may be adopted as part of a slightly different pipeline. For example, the Albumentations module offers more image transformations than the native tensorflow ImageDataGenerator module [3].
Stakeholders should pursue additional testing and training with this model, using more data from diverse sources, to boost accuracy even further. Tweaks to the model architecture should be tested. Additionally, this model should be validated in clinical scenarios with physicians verifying results. Stakeholders should be cautious in deploying and using this model and should ensure that a physician double-checks results, especially in cases where the model is less confident in its classification. Stakeholders may also seek to run this model on personal devices, such as laptops and cellphones, to mitigate problems such as stolen equipment and unreliable electricity.
Malaria is a large-scale health problem. As one of the oldest and deadliest diseases, it has killed millions of people across history. Claiming between 150 and 300 million lives in the 20th century [1], and 619,000 in 2021 [2], it comprises a significant share of humanity's death and disease burden. It may also be framed as a data science problem -- we are classifying infected and uninfected instances on the basis of visual features present in images of patients' cells. More specifically, the data science problem solved here is malaria classification based on visual features in cell images, a problem that convolutional neural networks (CNNs) are well suited for. Hence, a convolutional neural network was built here.
Several different models were tested, ranging from relatively simple to complex, with varying model architectures and data augmentation strategies. Broadly speaking, the models tested were an initial base model, a second model with more layers, a third model with batch normalization, a fourth model with data augmentation strategies and a dropout layer, and a final fifth model built on the VGG16 pre-trained model. Details may be found in the full code below. Some model variations and architectures were not tested due to computing and cost constraints, and should be explored in more depth as suggested in the recommendations section.
The third model was picked as the best, offering a 98.5% accuracy rate with 99% recall for parasitized instances: a high-accuracy solution that minimizes false negatives. This minimization of false negatives is highly favorable within the context of a health problem like malaria diagnosis -- the worst-case scenario is to wrongly classify an infected person as healthy. The F1 scores for parasitized and uninfected instances are 99% and 98%, respectively. This means that overall, both false negatives and false positives are rare.
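As a worked illustration of how these metrics relate to the counts in a confusion matrix, the sketch below uses hypothetical counts chosen for illustration only (they are not the model's actual test tallies):

```python
# Hypothetical confusion-matrix counts for illustration only; NOT the
# model's actual test results. Class of interest: parasitized.
tp = 1287  # parasitized correctly flagged (true positives)
fn = 13    # parasitized missed (false negatives, the dangerous case)
fp = 26    # uninfected wrongly flagged (false positives)
tn = 1274  # uninfected correctly cleared (true negatives)

recall = tp / (tp + fn)      # sensitivity: share of true infections caught
precision = tp / (tp + fp)   # share of positive calls that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

High recall on the parasitized class is exactly the "minimized false negatives" property discussed above: the `fn` term in the denominator is what recall penalizes.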
The constructed model features 3 convolutional layers for feature extraction, 2 MaxPooling layers, 3 dense layers, and a dropout layer located after the first dense layer for regularization. Additionally, the leaky ReLU activation function was used, with the exceptions of the first convolutional layer, which used the hyperbolic tangent function, and the last dense layer, which used a softmax function. This amounts to a relatively lightweight and fast solution with 542,514 parameters.
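The layer layout described above can be sketched in Keras as follows. The filter and unit counts here are illustrative assumptions (the text does not list them), so the parameter total of this sketch will not necessarily match the reported 542,514; only the layer arrangement follows the description.

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout, LeakyReLU, Input)

# Sketch of the described layout: 3 conv layers, 2 max-pooling layers,
# 3 dense layers, dropout after the first dense layer; tanh on the first
# conv layer, leaky ReLU elsewhere, softmax on the output.
# Filter/unit counts below are assumptions for illustration.
model = Sequential([
    Input(shape=(64, 64, 3)),
    Conv2D(32, 3, padding='same', activation='tanh'),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same'), LeakyReLU(),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same'), LeakyReLU(),
    Flatten(),
    Dense(64), LeakyReLU(),
    Dropout(0.4),  # regularization after the first dense layer
    Dense(32), LeakyReLU(),
    Dense(2, activation='softmax'),  # 2 classes: parasitized / uninfected
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
```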
Practically speaking, this simplicity means the model will take up less space on hard drives and run faster. The utility of this is amplified when we consider that the model will in all likelihood be used in developing countries, where malaria is not only more prevalent, but clinical workers also face more technological and computing constraints, as well as potentially sparse internet access. Not only will this model be more accommodating to older and slower hardware, it will also be easier to update on the ground. Data scientists, or clinicians with data science skills, will be able to retrain and fine-tune a model like this one, with only about 540 thousand parameters, far more easily. Compare this to certain pre-trained models, like the VGG16 model tested here, which used approximately 15 million parameters. Not only is the proposed model smaller and faster than the VGG16 model, it is also more accurate, with the VGG16 model's accuracy at 96.2% versus the proposed model's 98.5%.
In terms of malaria diagnosis under on-the-ground time and computing constraints, the proposed model will save physicians significant time, protect heavily against misdiagnosis, and flag which diagnoses should be revisited, perhaps with a more thorough procedure. Applied by physicians dealing with malaria patients, this model can help save time, money, and human lives.
This model should be used as a tool to enhance physicians' malaria diagnosis capabilities; it is not meant to be a replacement for clinicians. Model diagnoses should be used to double-check physician conclusions and vice versa. Cases where the model and physician disagree should be investigated more thoroughly, perhaps with more rigorous diagnostic methodologies, with the physician making the final call as to a patient's status as infected or uninfected.
Stakeholders should see to it that the model is tested in real-world scenarios and validated by physicians. The model's lightweight nature allows it to be easily shared with doctors and run on a wide variety of machines, in the plethora of contexts and conditions under which malaria diagnosis is required. As such, it should be distributed to hospitals and clinics for their own use and testing with their own unique data. Feedback should be sought from those using the model, and the model should be iterated on accordingly. More data from a variety of sources should also be used to update the model's training. The model may also serve as a base model for further training -- its architecture and parameters can be updated on the ground based on new data, which is again made easier by its quick and lightweight nature. An additional recommendation would be to adapt the model to run on smartphones, such as iPhone or Android devices.
As stated previously, this model's lightweight nature will allow it to run more easily where internet access, computing and technical resources, and human capital are limited. Its simplicity is its strength: it is faster and easier to run, takes up less space on hard drives, and is highly adaptable thanks to faster retraining should on-the-ground clinicians see fit to do so.
The model's ability to make malaria diagnosis more reliable and accurate will have significant economic and quality-of-life effects in places where malaria is prevalent. In 2021, the WHO found that what it defines as the African region carries 95% of malaria cases and 96% of malaria deaths. Surprisingly, malaria is actually overdiagnosed [4]. This means that clinical materials, doctors' time, and transportation costs are all wasted when malaria is diagnosed as a false positive -- all resources that could be spent on those who are truly infected with malaria. In 2019, Haakenstad et al. found that global spending on malaria across both government and out-of-pocket expenditure amounted to $4.3 billion USD [5].
At present, it is not possible to determine how much of this money is overspent due to overdiagnosis of malaria. However, data from North-Eastern Tanzania, collected by Mosha et al., suggests a conservative 15 to 45% range for misdiagnosis [4]. Extrapolating this range globally suggests the proposed model may save an estimated 0.65 to 1.9 billion USD annually. Distribution of the model will cost virtually nothing, as most clinics and hospitals will have the on-site resources to run it, and doctors and lab technicians will be able to operate it with minimal instruction. There will be no need for cloud computing expenses, and minimal need for hardware distribution in cases where the model cannot be downloaded over the internet; all that would be needed in such a case is a small flash drive delivered to the hospital. As an overinflated estimate, such costs will amount to at most 100 thousand dollars.
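The savings range above follows directly from applying the Tanzanian misdiagnosis rates to the global spending figure; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the savings range quoted above.
global_spend_busd = 4.3                  # global malaria spending, billions USD [5]
misdiag_low, misdiag_high = 0.15, 0.45   # Tanzanian misdiagnosis range [4]

low = global_spend_busd * misdiag_low    # lower bound, billions USD (about 0.65)
high = global_spend_busd * misdiag_high  # upper bound, billions USD (about 1.9)
print(f"Estimated annual savings: {low:.2f} to {high:.2f} billion USD")
```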
Key risks and challenges include the possibility of technical difficulties on the ground, such as data formatting issues. A weakness of this model is that it has been trained on one dataset of relatively high quality images, which is why training on more data from a variety of sources has been recommended. However, this weakness has been mitigated significantly through the data augmentation and regularization practices mentioned previously. Another challenge will be running the model in places where electricity is unreliable -- for these situations, battery-powered devices such as laptops and phones are recommended. Running the model on personal devices like these will also mitigate losses from stolen equipment, an unfortunate reality in some areas of developing countries. The model is strong overall, but it will need more exposure to real-world data to see how it performs in clinical contexts.
As mentioned previously, further analysis should include training on more data, as well as physician double-checking of results. Additionally, the model should be exposed to different strains of malaria at different stages. To further maximize the effectiveness of any particular dataset, the Albumentations data augmentation module may also be used, as it offers significantly more image transformations than tensorflow's native ImageDataGenerator module [3]; for example, Gaussian blurring may be used to augment image data. Different pre-trained models besides VGG16 may also be tested. It may also be possible to significantly improve model performance with patient-specific data, although this is a significantly more complex task that ventures deeper into the growing field of precision medicine. More general changes may be made to the model architecture as well, by modifying elements like activation functions and learning rates.
Overall, this model will prove useful to physicians on the ground, benefit the lives of millions, and allow governments, charity organizations, and research groups to allocate their resources more effectively by cutting down on wasted expenses.
Problem Definition
The context: Why is this problem important to solve?
The objectives: What is the intended goal?
The key questions: What are the key questions that need to be answered?
The problem formulation: What is it that we are trying to solve using data science?
There are a total of 24,958 training and 2,600 test images (in color) taken from microscopic imaging. These images fall into the following categories:
Parasitized: The parasitized cells contain the Plasmodium parasite which causes malaria
Uninfected: The uninfected cells are free of the Plasmodium parasites
The dataset appears to come from a single source. The images are .png files, each approximately 6-16KB in size, with most around 12KB. Pixel density is 72ppi, and most images are around 120x120 pixels, although this varies, and images are not necessarily square. These are images of parasitized and uninfected individual cells.
Mount the Drive
from google.colab import drive
path = '/content/drive'
drive.mount(path)
Mounted at /content/drive
# for path and file-handling
import os
import zipfile
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# working with tensors and images
import tensorflow as tf
from PIL import Image
path = '/content/drive/MyDrive/MIT ADSP Capstone/cell_images.zip'
with zipfile.ZipFile(path, 'r') as zip_data:
    zip_data.extractall()
folder_path = '/content/cell_images'
The extracted folder contains separate train and test folders, each with subfolders of parasitized and uninfected cell images of varying sizes.
All images must be the same size and should be converted to 4D arrays so that they can be used as input for the convolutional neural network. We also need to create labels for both types of images in order to train and test the model.
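As a minimal illustration of this requirement (using synthetic arrays rather than the actual cell images), uniformly sized images stack into the 4D (batch, height, width, channels) tensor a CNN expects:

```python
import numpy as np

# Three synthetic 64x64 RGB "images" standing in for resized cell images.
imgs = [np.random.rand(64, 64, 3).astype('float32') for _ in range(3)]
labels = np.array([0, 1, 0])  # 0 = parasitized, 1 = uninfected

# Stacking equally sized images yields the 4D input tensor a CNN expects:
# (num_images, height, width, channels).
X = np.stack(imgs)
print(X.shape)  # (3, 64, 64, 3)
```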
Let's do the same for the training data first and then we will use the same code for the test data as well.
from tensorflow.keras.utils import image_dataset_from_directory # for encoding and grabbing data from directory
# Define a constant image size:
IMG_SIZE = 64
# Create and use dataloaders to create sets for data analysis and exploration
# REF: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
batch_size = 128
train_set = image_dataset_from_directory(folder_path + '/train',
subset='training',
validation_split=0.2,
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size
)
val_set = image_dataset_from_directory(folder_path + '/train',
subset='validation',
validation_split=0.2,
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size
)
test_set = image_dataset_from_directory(folder_path + '/test',
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size
)
Found 24958 files belonging to 2 classes. Using 19967 files for training. Found 24958 files belonging to 2 classes. Using 4991 files for validation. Found 2600 files belonging to 2 classes.
# Check the class names:
train_set.class_names
['parasitized', 'uninfected']
From here, we can tell that 'parasitized' and 'uninfected' correspond to 0 and 1 labels, respectively.
for image_batch, labels_batch in train_set:
    print("Shape of training set images:", image_batch.shape)
    break
for image_batch, labels_batch in test_set:
    print("Shape of test set images:", image_batch.shape)
    break
Shape of training set images: (128, 64, 64, 3) Shape of test set images: (128, 64, 64, 3)
We can see there are 128 images in a batch, each of shape (64 x 64 x 3): 64 x 64 pixels, with 3 RGB channels.
Check the shape of train and test labels
for image_batch, labels_batch in train_set:
    print("Shape of training set labels:", labels_batch.shape)
    break
for image_batch, labels_batch in test_set:
    print("Shape of test set labels:", labels_batch.shape)
    break
Shape of training set labels: (128,) Shape of test set labels: (128,)
Observations and insights: 128 labels in a batch.
# Unbatch all sets of images, then convert to numpy, then to dataframe
train_set = train_set.unbatch()
val_set = val_set.unbatch()
test_set = test_set.unbatch()
# use as_numpy_iterator(), a method from tensorflow's dataset object to convert items into numpy arrays
train_set = list(train_set.as_numpy_iterator())
val_set = list(val_set.as_numpy_iterator())
test_set = list(test_set.as_numpy_iterator())
train_set = np.array(train_set, dtype=object)
val_set = np.array(val_set, dtype=object)
test_set = np.array(test_set, dtype=object)
# for ease of exploratory data analysis, we will use dataframes
train_df = pd.DataFrame(train_set)
val_df = pd.DataFrame(val_set)
test_df = pd.DataFrame(test_set)
# images are represented by pixel values ranging from 0-255, so start the
# running minimum at the top of the range and the running maximum at the bottom,
# then update them with what we find in each dataset.
def pixel_min_max(images):
    '''Return the smallest and largest pixel value found across a series of images.'''
    lo, hi = 255, 0
    for img in images:
        lo = min(lo, np.min(img))
        hi = max(hi, np.max(img))
    return lo, hi

print("Min, Max for training dataset:", *pixel_min_max(train_df[0]))
print("Min, Max for val dataset:", *pixel_min_max(val_df[0]))
print("Min, Max for test dataset:", *pixel_min_max(test_df[0]))
Min, Max for training dataset: 0.0 255.0 Min, Max for val dataset: 0.0 255.0 Min, Max for test dataset: 0.0 255.0
We may group the training and validation datasets for the purpose of this question.
The training dataset (min, max) is (0.0, 255.0).
The test dataset (min, max) is (0.0, 255.0).
Observations and insights: The maximum pixel value is 255, and the minimum pixel value 0. This is true across both training and validation sets.
# Where 0 is parasitized, and 1 is uninfected.
# sort_index() so we can see the values according to label clearly, instead of ranking by frequency.
train_df[1].value_counts().sort_index()
0 10102 1 9865 Name: 1, dtype: int64
val_df[1].value_counts().sort_index()
0 2480 1 2511 Name: 1, dtype: int64
test_df[1].value_counts().sort_index()
0 1300 1 1300 Name: 1, dtype: int64
count_check = len(train_df) + len(val_df) + len(test_df)
# if we have done everything right, this will evaluate to 27558 and match
# the sum of parasitized and infected across all datasets, and images in the folders
# as given by the problem statement (i.e. 24958 + 2600)
if count_check == (24958 + 2600):
    print("The number of images in all sets matches the number given by the problem statement.")
The number of images in all sets matches the number given by the problem statement.
In the training dataset: 10102 parasitized, 9865 uninfected.
In the validation dataset: 2480 parasitized, 2511 uninfected.
(Train + validation: 12582 parasitized, 12376 uninfected.)
In the test dataset: 1300 parasitized, 1300 uninfected.
# train_df[0] represents a series of all images
# train_df[1] represents a series of all labels, corresponding to those images.
# same pattern repeated for val_df and test_df
train_df[0] = train_df[0]/255.0
val_df[0] = val_df[0]/255.0
test_df[0] = test_df[0]/255.0
Observations and insights: Data has been normalized by dividing by 255.0.
# put all labels in series variables for ease of use and clarity
s1 = train_df[1]
s2 = val_df[1]
s3 = test_df[1]
#get a count of uninfected and parasitized labels across all the data
label_0 = s1.value_counts()[0] + s2.value_counts()[0] + s3.value_counts()[0]
label_1 = s1.value_counts()[1] + s2.value_counts()[1] + s3.value_counts()[1]
# Create a dataframe with the labels and their counts
d = {'Uninfected' : label_1, 'Parasitized' : label_0}
labels_df = pd.DataFrame(d, index=['Label Count'])
labels_df
| | Uninfected | Parasitized |
|---|---|---|
| Label Count | 13676 | 13882 |
# plot
sns.barplot(labels_df)
plt.title('Classification Counts');
Observations and insights: The data is balanced. There are 13676 'uninfected' instances, and 13882 'parasitized' instances, so slightly more parasitized instances.
Let's visualize the images from the train data
# For labeling plots, create string versions of the labels
train_labels = train_df[1].map({0: 'Parasitized', 1: 'Uninfected'})
val_labels = val_df[1].map({0: 'Parasitized', 1: 'Uninfected'})
test_labels = test_df[1].map({0: 'Parasitized', 1: 'Uninfected'})
Observations and insights: We have now created string label lists for each dataset.
Please note: due to the constraints of the instructions, labels above the bottom row are cut off. Another plot has been included below the first, adjusted for the labels.
plt.figure(figsize=(12, 12))
for i in range(36):
    plt.subplot(6, 6, i+1)
    plt.imshow(train_df[0][i])
    plt.xlabel(train_labels[i])
plt.show()
plt.figure(figsize=(13, 15)) # adjusted for the labels to show in the plot
for i in range(36):
    plt.subplot(6, 6, i+1)
    plt.imshow(train_df[0][i])
    plt.xlabel(train_labels[i])
plt.show()
Observations and insights: Parasitized images have distinct discoloration and features. Uninfected images look more featureless, more "monotone" and even-colored. Parasitized images also generally have more edges, and are more misshapen, where the uninfected images are generally rounder and smoother in terms of edges.
def plot_mean_img(img_arr1, img_arr2, img_arr3, label="Label"):
    '''Concatenate three image series and plot their mean image.
    Three series arguments, img_arr1 to img_arr3, and an optional "label"
    parameter to show which set is being plotted.'''
    imgs = pd.concat([img_arr1, img_arr2, img_arr3])  # concatenate the image series
    m = np.mean(imgs)  # generate the mean image
    # plot
    plt.imshow(m)
    plt.xlabel(label)
# separate out uninfected and parasitized instances, take subsets of dataframe based on condition of label value
train_df_uninfected = train_df.loc[train_df[1] == 1][0]
train_df_parasitized = train_df.loc[train_df[1] == 0][0]
val_df_uninfected = val_df.loc[val_df[1] == 1][0]
val_df_parasitized = val_df.loc[val_df[1] == 0][0]
test_df_uninfected = test_df.loc[test_df[1] == 1][0]
test_df_parasitized = test_df.loc[test_df[1] == 0][0]
Mean image for parasitized
plot_mean_img(train_df_parasitized, val_df_parasitized, test_df_parasitized, 'Parasitized')
Mean image for uninfected
plot_mean_img(train_df_uninfected, val_df_uninfected, test_df_uninfected, 'Uninfected')
Observations and insights: The average uninfected and parasitized images do not look very different to the untrained human eye, although the parasitized average does look slightly redder in color.
Converting the train data
import cv2
# REF: https://www.tutorialspoint.com/how-to-convert-an-rgb-image-to-hsv-image-using-opencv-python
# note that training data was initially split into test and validation set
# by tensorflow's image dataloader function
train_hsv_list = [] # a list for the hsv images
for img in train_df[0]: # loop through the pandas series image by image and convert
    # tensorflow loads images in RGB channel order, so convert with COLOR_RGB2HSV
    hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    train_hsv_list.append(hsv_img)
val_hsv_list = []
for img in val_df[0]:
    hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    val_hsv_list.append(hsv_img)
#print(len(train_hsv_list), len(val_hsv_list)) # to verify that no image was missed, should be 19967 and 4991
plt.figure(figsize=(8, 9)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(train_hsv_list[i])
    plt.xlabel(train_labels[i])
plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). (warning repeated once per subplot)
Converting the test data
test_hsv_list = []
for img in test_df[0]:
    # tensorflow loads images in RGB channel order, so convert with COLOR_RGB2HSV
    hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    test_hsv_list.append(hsv_img)
# print(len(test_hsv_list)) # to verify that no image was missed, should be 2600
plt.figure(figsize=(8, 9)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(test_hsv_list[i])
    plt.xlabel(test_labels[i])
plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). (warning repeated once per subplot)
Observations and insights: HSV images have more aggressive color change, notable in the parasitized cells especially. Visual markers for parasitic infection are easier to see (greater contrast between discolored feature and pink/magenta background or average coloration of the cell).
Gaussian Blurring on train data
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    sigmaX = np.random.randint(0, 11) # randomly pick a sigmaX between 0 and 10
    plt.imshow(cv2.GaussianBlur(src=train_df[0][i], ksize=(3, 3), sigmaX=sigmaX))
    plt.xlabel(train_labels[i] + ", sigmaX = " + str(sigmaX))
plt.show()
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    k = np.random.randint(0, 2) # randomly pick a ksize
    ksize = (3, 3) if k == 0 else (5, 5)
    plt.imshow(cv2.GaussianBlur(src=train_df[0][i], ksize=ksize, sigmaX=0))
    plt.xlabel(train_labels[i] + ", ksize = " + str(ksize))
plt.show()
Gaussian Blurring on test data
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    sigmaX = np.random.randint(0, 11)
    plt.imshow(cv2.GaussianBlur(src=test_df[0][i], ksize=(3, 3), sigmaX=sigmaX))
    plt.xlabel(test_labels[i] + ", sigmaX = " + str(sigmaX))
plt.show()
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
    plt.subplot(3, 3, i+1)
    k = np.random.randint(0, 2)
    ksize = (3, 3) if k == 0 else (5, 5)
    plt.imshow(cv2.GaussianBlur(src=test_df[0][i], ksize=ksize, sigmaX=0))
    plt.xlabel(test_labels[i] + ", ksize = " + str(ksize))
plt.show()
Observations and insights: Gaussian blurring makes the image blurrier. A higher sigmaX value appears to make the image slightly blurrier, though the difference is hard to judge by eye. The sigmaX value defines the standard deviation of the Gaussian along the X axis, and when sigmaY is left unspecified it is taken to be the same as sigmaX.
A larger kernel size is associated with more blurring/smoothing, and a larger standard deviation (sigmaX) is likewise associated with more blur, due to greater weight being placed on pixels farther from the center of the kernel window.
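The relationship between sigma and blur strength can be seen directly in the 1-D Gaussian weights themselves. This is a numpy sketch, independent of OpenCV: a larger sigma moves weight from the center pixel toward the edge pixels, which is what produces stronger smoothing.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma):
    """Return normalized 1-D Gaussian weights of length ksize."""
    x = np.arange(ksize) - (ksize - 1) / 2.0  # offsets from the center
    w = np.exp(-x**2 / (2.0 * sigma**2))
    return w / w.sum()  # normalize so the weights sum to 1

narrow = gaussian_kernel_1d(5, sigma=1.0)
wide = gaussian_kernel_1d(5, sigma=3.0)

# A larger sigma gives the center pixel less weight and the edge pixels more,
# i.e. stronger smoothing for the same kernel size.
print(narrow.round(3), wide.round(3))
```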
REFS:
https://hackaday.com/2021/07/21/what-exactly-is-a-gaussian-blur/
Think About It: Would blurring help us for this problem statement in any way? What else can we try?
Yes, blurring will help with this problem because it will help control for lower resolution images that may appear in real-world applications. We may also try things like image rotations, truncation, distortions, discoloration, and any number of other data augmentation techniques for images.
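A library-agnostic numpy sketch of a few such label-preserving transformations (illustrative only; the modeling pipeline relies on tensorflow's own tooling):

```python
import numpy as np

rng = np.random.default_rng(117)
img = rng.random((64, 64, 3)).astype('float32')  # synthetic stand-in cell image

flipped = np.fliplr(img)      # horizontal flip
rotated = np.rot90(img, k=1)  # 90-degree rotation
# mild additive Gaussian noise, clipped back into the valid 0-1 pixel range
noisy = np.clip(img + rng.normal(0, 0.05, img.shape), 0.0, 1.0)

# Each transform preserves the label while diversifying the training inputs.
print(flipped.shape, rotated.shape, noisy.shape)
```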
Importing the required libraries for building and training our Model
from tensorflow.keras.models import Sequential, Model # sequential api for sequential model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Rescaling
from tensorflow.keras.layers import BatchNormalization, Activation, Input, LeakyReLU
from tensorflow.keras import backend, losses, optimizers
from tensorflow.keras.optimizers import RMSprop, Adam, SGD # optimizers for modeling
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
# Clearing the backend and setting random number seeds.
# this is done to clear up the backend from previous runs as the model is modified.
from tensorflow.keras import backend
import random
def reset_session(random_seed=117):
    '''Reset the session. Sets numpy, random, and tensorflow random seeds to the given parameter, or 117 by default.'''
    np.random.seed(random_seed)
    random.seed(random_seed)
    tf.random.set_seed(random_seed)
    backend.clear_session()
reset_session()
# reimport the data for ease in formatting and optimization
# Create and use dataloaders to create sets and pass to neural networks
# REF: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
batch_size = 128
IMG_SIZE = 64
train_ds = image_dataset_from_directory(folder_path + '/train',
validation_split=0.2,
subset='training',
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size,
label_mode='int'
)
val_ds = image_dataset_from_directory(folder_path + '/train',
validation_split=0.2,
subset='validation',
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size,
label_mode='int'
)
test_ds = image_dataset_from_directory(folder_path + '/test',
seed=117,
image_size=(IMG_SIZE, IMG_SIZE),
batch_size=batch_size,
label_mode='int'
)
Found 24958 files belonging to 2 classes. Using 19967 files for training. Found 24958 files belonging to 2 classes. Using 4991 files for validation. Found 2600 files belonging to 2 classes.
# get all images and labels from the datasets in arrays
# training
train_ds_unbatched = train_ds.unbatch()
X_train = []
y_train = []
for image, label in train_ds_unbatched:  # unbatched: yields single images
    X_train.append(image.numpy())
    y_train.append(label.numpy())
X_train = np.asarray(X_train).astype('float32')
y_train = np.asarray(y_train).astype('float32')
# validation
val_ds_unbatched = val_ds.unbatch()
X_val = []
y_val = []
for image, label in val_ds_unbatched:  # unbatched: yields single images
    X_val.append(image.numpy())
    y_val.append(label.numpy())
X_val = np.asarray(X_val).astype('float32')
y_val = np.asarray(y_val).astype('float32')
# test
test_ds_unbatched = test_ds.unbatch()
X_test = []
y_test = []
for image, label in test_ds_unbatched:  # unbatched: yields single images
    X_test.append(image.numpy())
    y_test.append(label.numpy())
X_test = np.asarray(X_test).astype('float32')
y_test = np.asarray(y_test).astype('float32')
# use tf's AUTOTUNE function for performance optimization
# REF: https://www.tensorflow.org/guide/data_performance
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(500).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
One Hot Encoding the train and test labels
# encoding is taken care of already in the loading of the data (cell 39),
# via tensorflow's image_dataset_from_directory method with label_mode='int',
# which pairs with the SparseCategoricalCrossentropy loss used below.
# alternatively, we could have one-hot encoded with tensorflow's to_categorical function.
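As an aside, one-hot encoding is easy to reproduce by hand. A minimal numpy sketch of what `to_categorical` effectively does (the `one_hot` function name and label values here are illustrative, not from the project code):

```python
import numpy as np

def one_hot(labels, num_classes):
    # Build an all-zeros matrix, then place a 1 at each row's class index.
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

y = np.array([0, 1, 1, 0])  # e.g. 0 = parasitized, 1 = uninfected
print(one_hot(y, 2))
```

With integer labels and `SparseCategoricalCrossentropy`, this step is unnecessary; it would only be needed with the plain `CategoricalCrossentropy` loss.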
Building the model
#initialize a sequential model
model0 = Sequential([
Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
Conv2D(16, 3, padding='same', activation='relu'),
MaxPooling2D((2, 2)),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D((2, 2)),
Flatten(),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])
Compiling the model
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model0.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model0.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape              Param #
=================================================================
 rescaling (Rescaling)           (None, 64, 64, 3)         0
 conv2d (Conv2D)                 (None, 64, 64, 16)        448
 max_pooling2d (MaxPooling2D)    (None, 32, 32, 16)        0
 conv2d_1 (Conv2D)               (None, 32, 32, 32)        4640
 max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 32)        0
 flatten (Flatten)               (None, 8192)              0
 dense (Dense)                   (None, 128)               1048704
 dense_1 (Dense)                 (None, 64)                8256
 dense_2 (Dense)                 (None, 2)                 130
=================================================================
Total params: 1,062,178
Trainable params: 1,062,178
Non-trainable params: 0
_________________________________________________________________
Using Callbacks
# create model checkpoint, filepath
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-{epoch:02d}-{val_accuracy:.4f}.hdf5'
callbacks = [
EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]
Fit and train our Model
epochs = 15
history_0 = model0.fit(
train_ds,
validation_data=val_ds,
epochs=epochs,
callbacks=callbacks
)
Training output (abridged): val_accuracy improved from 0.7133 (epoch 1) to 0.9671 (epoch 7), then failed to improve for five consecutive epochs; early stopping triggered at epoch 12 (final train accuracy 0.9945, val_accuracy 0.9605). The best checkpoint was saved at epoch 7.
Evaluating the model on test data
loss0, accuracy0 = model0.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy0)
21/21 - 1s - loss: 0.0970 - accuracy: 0.9665 - 753ms/epoch - 36ms/step 0.9665384888648987
Plotting the confusion matrix
y_predictions = model0.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
21/21 - 0s - 193ms/epoch - 9ms/step
conf_matrix_model0 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model0, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
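Since false negatives are the critical failure mode in this diagnosis context, per-class recall is worth computing alongside accuracy. A minimal numpy sketch using a hypothetical 2x2 confusion matrix (the counts below are made up for illustration, not model0's actual results):

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual, columns = predicted,
# class 0 = Parasitized, class 1 = Uninfected.
cm = np.array([[1290,   10],
               [  35, 1265]])

# Recall per class = correct predictions on the diagonal divided by
# the row total (all actual instances of that class). For Parasitized,
# recall is 1 minus the false negative rate.
recall = np.diag(cm) / cm.sum(axis=1)
print(recall)
```

The same numbers can be obtained from the imported `classification_report(y_test, y_predictions)`, which also reports precision and F1.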
Plotting the train and validation curves
plt.plot(history_0.history['accuracy'])
plt.plot(history_0.history['val_accuracy'])
plt.legend(['Train', 'Validation'])
plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
It appears we have slight overfitting; regularization techniques may help with this, such as data augmentation and dropout layers.
Now let's build another model with a few additional layers and altered activation functions, and check whether performance improves.
Trying to improve the performance of our model by adding new layers
Building the Model
# first, reset the backend:
reset_session()
model1 = Sequential([
Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
Conv2D(16, 3, padding='same', activation='tanh'),
MaxPooling2D((2, 2)),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D((2, 2)),
Conv2D(16, 3, padding='same', activation='relu'),
Flatten(), # flatten before passing data onto fully connected layers
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])
In this model, we have changed the first convolutional layer's activation function to tanh and added a third convolutional layer (16 filters, with no pooling layer after it), which halves the flattened feature size and roughly halves the total parameter count.
Compiling the model
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model1.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape              Param #
=================================================================
 rescaling (Rescaling)           (None, 64, 64, 3)         0
 conv2d (Conv2D)                 (None, 64, 64, 16)        448
 max_pooling2d (MaxPooling2D)    (None, 32, 32, 16)        0
 conv2d_1 (Conv2D)               (None, 32, 32, 32)        4640
 max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 32)        0
 conv2d_2 (Conv2D)               (None, 16, 16, 16)        4624
 flatten (Flatten)               (None, 4096)              0
 dense (Dense)                   (None, 128)               524416
 dense_1 (Dense)                 (None, 64)                8256
 dense_2 (Dense)                 (None, 2)                 130
=================================================================
Total params: 542,514
Trainable params: 542,514
Non-trainable params: 0
_________________________________________________________________
Using Callbacks
# create model checkpoint filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5'
callbacks = [
EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]
Fit and Train the model
epochs = 15
history_1 = model1.fit(
train_ds,
validation_data=val_ds,
epochs=epochs,
callbacks=callbacks
)
Training output (abridged): val_accuracy improved from 0.7922 (epoch 1) to 0.9772 (epoch 7), then failed to improve for five consecutive epochs; early stopping triggered at epoch 12 (final train accuracy 0.9953, val_accuracy 0.9742). The best checkpoint was saved at epoch 7.
Evaluating the model
loss1, accuracy1 = model1.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy1)
21/21 - 1s - loss: 0.0809 - accuracy: 0.9738 - 1s/epoch - 61ms/step 0.9738461375236511
Plotting the confusion matrix
y_predictions = model1.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
len(y_predictions) # will get 2600
conf_matrix_model1 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model1, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
21/21 - 0s - 154ms/epoch - 7ms/step
Plotting the train and the validation curves
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.legend(['Train', 'Validation'])
plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
Now let's build a model using BatchNormalization and LeakyReLU as the activation function.
# first, reset the backend:
reset_session()
Building the Model
model2 = Sequential([
Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
Conv2D(16, 3, padding='same'),
LeakyReLU(0.2),
MaxPooling2D((2, 2)),
Conv2D(32, 3, padding='same'),
LeakyReLU(0.2),
MaxPooling2D((2, 2)),
Conv2D(16, 3, padding='same'),
LeakyReLU(0.2),
Flatten(),
Dense(128),
LeakyReLU(0.2),
Dense(64),
LeakyReLU(0.2),
BatchNormalization(),
Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])
Compiling the model
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model2.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                              Output Shape       Param #
=====================================================================
 rescaling (Rescaling)                     (None, 64, 64, 3)  0
 conv2d (Conv2D)                           (None, 64, 64, 16) 448
 leaky_re_lu (LeakyReLU)                   (None, 64, 64, 16) 0
 max_pooling2d (MaxPooling2D)              (None, 32, 32, 16) 0
 conv2d_1 (Conv2D)                         (None, 32, 32, 32) 4640
 leaky_re_lu_1 (LeakyReLU)                 (None, 32, 32, 32) 0
 max_pooling2d_1 (MaxPooling2D)            (None, 16, 16, 32) 0
 conv2d_2 (Conv2D)                         (None, 16, 16, 16) 4624
 leaky_re_lu_2 (LeakyReLU)                 (None, 16, 16, 16) 0
 flatten (Flatten)                         (None, 4096)       0
 dense (Dense)                             (None, 128)        524416
 leaky_re_lu_3 (LeakyReLU)                 (None, 128)        0
 dense_1 (Dense)                           (None, 64)         8256
 leaky_re_lu_4 (LeakyReLU)                 (None, 64)         0
 batch_normalization (BatchNormalization)  (None, 64)         256
 dense_2 (Dense)                           (None, 2)          130
=====================================================================
Total params: 542,770
Trainable params: 542,642
Non-trainable params: 128
_________________________________________________________________
Using callbacks
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2.hdf5'
callbacks = [
EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]
Fit and train the model
epochs = 15
history_2 = model2.fit(
train_ds,
validation_data=val_ds,
epochs=epochs,
callbacks=callbacks
)
Training output (abridged): training accuracy climbed steadily to 0.9614, but val_accuracy oscillated wildly across epochs (0.4969, 0.8964, 0.4969, 0.5923, 0.8567, 0.8431, 0.5063); the best val_accuracy of 0.8964 came at epoch 2, and early stopping triggered at epoch 7.
Plotting the train and validation accuracy
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])
plt.legend(['Train', 'Validation'])
plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
Evaluating the model
loss2, accuracy2 = model2.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy2)
21/21 - 1s - loss: 3.4147 - accuracy: 0.5235 - 953ms/epoch - 45ms/step 0.5234615206718445
Observations and insights: The batch normalization layer appears to make the validation accuracy highly unstable across epochs. For this particular model, batch normalization may not be the appropriate regularization technique; a dropout layer may be more effective.
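To see why dropout behaves differently from batch normalization, here is a minimal numpy sketch of inverted dropout, the mechanism Keras's Dropout layer applies during training (the function name and values below are illustrative):

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    # Zero out roughly `rate` of the units at random, then rescale the
    # survivors by 1/(1-rate) so the expected activation is unchanged
    # and no extra rescaling is needed at inference time.
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(117)
x = np.ones((4, 8))                      # a batch of 4 activation vectors
out = inverted_dropout(x, 0.1, rng)      # entries are either 0 or 1/0.9
```

Unlike batch normalization, this injects noise without tying each example's output to batch statistics, which is one plausible reason it trains more stably on this model.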
Generate the classification report and confusion matrix
y_predictions = model2.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
conf_matrix_model2 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model2, annot=True, fmt='.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
21/21 - 0s - 168ms/epoch - 8ms/step
reset_session()
Use image data generator
# Create data generator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator( # for use in model, image data generator for augmentation
rotation_range=180,
shear_range=25,
zoom_range=[0.8, 1.2],
brightness_range=[0.5, 1.5],
fill_mode='nearest'
)
Visualizing Augmented images
train_labels_ser = pd.Series(y_train).map({0: 'Parasitized', 1: 'Uninfected'}) # for labeling in plot
datagen_test = ImageDataGenerator(
rotation_range=180,
fill_mode='nearest')
img = datagen_test.flow(X_train/255, batch_size=6, shuffle=False) # shuffle=False keeps images aligned with their labels
plt.figure(figsize=(12,8))
batch = next(img) # one batch of 6 augmented images, in original order
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.imshow(batch[i])
    plt.xlabel(train_labels_ser[i])
plt.show()
datagen_test = ImageDataGenerator(
horizontal_flip=True,
vertical_flip=True,
fill_mode='nearest',
)
img = datagen_test.flow(X_train/255, batch_size=6, shuffle=False)
plt.figure(figsize=(12,8))
batch = next(img) # one batch of 6 augmented images, in original order
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.imshow(batch[i])
    plt.xlabel(train_labels_ser[i])
plt.show()
datagen_test = ImageDataGenerator(
width_shift_range=[-16, 16],
fill_mode='nearest',
)
img = datagen_test.flow(X_train/255, batch_size=6, shuffle=False)
plt.figure(figsize=(12,8))
batch = next(img) # one batch of 6 augmented images, in original order
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.imshow(batch[i])
    plt.xlabel(train_labels_ser[i])
plt.show()
datagen_test = ImageDataGenerator(
shear_range=40,
fill_mode='nearest',
)
img = datagen_test.flow(X_train/255, batch_size=6, shuffle=False)
plt.figure(figsize=(12,8))
batch = next(img) # one batch of 6 augmented images, in original order
for i in range(6):
    plt.subplot(2, 3, i+1)
    plt.imshow(batch[i])
    plt.xlabel(train_labels_ser[i])
plt.show()
datagen_test = ImageDataGenerator(
zoom_range=[0.5, 1.5],
fill_mode='nearest',
)
img = datagen_test.flow(X_train, batch_size=1, shuffle=False)
plt.figure(figsize=(12,8))
for i in range(6):
    plt.subplot(2, 3, i+1)
    batch = next(img) # one image per batch, in original order
    plt.imshow(batch[0].astype('uint8'))
    plt.xlabel(train_labels_ser[i])
plt.show()
Observations and insights: Several transformations are meaningful for data augmentation, such as shear, rotation, width shift, and zoom. However, if these are not parametrized carefully, they can produce images that would not occur under any real-world conditions. Parametrized correctly, they can make the model more robust to variation in real-world data.
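The flip-style augmentations above are easy to reproduce as plain array operations. A minimal numpy sketch mimicking ImageDataGenerator's horizontal_flip/vertical_flip behavior (the `random_flip` function name and seed are illustrative):

```python
import numpy as np

def random_flip(image, rng):
    # image is HxWxC; each flip is applied independently with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]   # horizontal flip (reverse columns)
    if rng.random() < 0.5:
        image = image[::-1, :, :]   # vertical flip (reverse rows)
    return image

rng = np.random.default_rng(117)
augmented = random_flip(np.ones((64, 64, 3)), rng)
```

Because infected-cell images have no privileged orientation, flips and rotations are among the safest augmentations here, while shear and zoom need conservative ranges.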
Building the Model
model3 = Sequential([
Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
Conv2D(16, 3, padding='same', activation='tanh'),
MaxPooling2D((2, 2)),
Conv2D(32, 3, padding='same'),
LeakyReLU(0.1),
MaxPooling2D((2, 2)),
Conv2D(16, 3, padding='same'),
LeakyReLU(0.1),
Flatten(),
Dense(128, activation='relu'),
Dropout(0.1),
Dense(64, activation='relu'),
Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model3.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape              Param #
=================================================================
 rescaling (Rescaling)           (None, 64, 64, 3)         0
 conv2d (Conv2D)                 (None, 64, 64, 16)        448
 max_pooling2d (MaxPooling2D)    (None, 32, 32, 16)        0
 conv2d_1 (Conv2D)               (None, 32, 32, 32)        4640
 leaky_re_lu (LeakyReLU)         (None, 32, 32, 32)        0
 max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 32)        0
 conv2d_2 (Conv2D)               (None, 16, 16, 16)        4624
 leaky_re_lu_1 (LeakyReLU)       (None, 16, 16, 16)        0
 flatten (Flatten)               (None, 4096)              0
 dense (Dense)                   (None, 128)               524416
 dropout (Dropout)               (None, 128)               0
 dense_1 (Dense)                 (None, 64)                8256
 dense_2 (Dense)                 (None, 2)                 130
=================================================================
Total params: 542,514
Trainable params: 542,514
Non-trainable params: 0
_________________________________________________________________
Using Callbacks
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5'
callbacks = [
EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]
Fit and Train the model
epochs = 30 # as the best model, will train this for more epochs than others
history_3 = model3.fit(datagen.flow(
X_train, y_train, batch_size=128),
validation_data=datagen.flow(X_val, y_val, batch_size=128),
epochs=epochs,
callbacks=callbacks,
)
Training output (abridged): val_accuracy improved from 0.7353 (epoch 1) to 0.9768 (epoch 16), then failed to improve for five consecutive epochs; early stopping triggered at epoch 21 (final train accuracy 0.9750, val_accuracy 0.9738). The best checkpoint was saved at epoch 16.
Evaluating the model
loss3, accuracy3 = model3.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy3)
21/21 - 1s - loss: 0.0482 - accuracy: 0.9850 - 904ms/epoch - 43ms/step 0.9850000143051147
Plot the train and validation accuracy
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])
plt.legend(['Train', 'Validation'])
plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
Plotting the classification report and confusion matrix
y_predictions = model3.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # index of the highest probability in each row (axis=1 spans the class columns)
21/21 - 0s - 84ms/epoch - 4ms/step
conf_matrix_model3 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model3, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred=y_predictions))
              precision    recall  f1-score   support

         0.0       0.98      0.99      0.99      1300
         1.0       0.99      0.98      0.98      1300

    accuracy                           0.98      2600
   macro avg       0.99      0.98      0.98      2600
weighted avg       0.99      0.98      0.98      2600
All around, this model with data augmentation performs excellently. Test accuracy is higher than for any other model at 98.5% overall. For Uninfected instances, it is more prone to false negatives than false positives (precision higher than recall); conversely, for Parasitized instances, it is more prone to false positives than false negatives (precision lower than recall). This is ideal for the malaria diagnosis context: a false positive on a parasitized instance is far less costly than a false negative, which would leave an infected patient untreated.
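To make the precision/recall trade-off concrete, here is a minimal sketch (with hypothetical counts, not the exact matrix above) of how the per-class numbers in the report fall out of a 2x2 confusion matrix:

```python
import numpy as np

# rows = actual, cols = predicted; class 0 = Parasitized, class 1 = Uninfected
# (counts below are illustrative, not the actual matrix from the heatmap)
cm = np.array([[1287,   13],   # 13 parasitized cells missed (false negatives)
               [  26, 1274]])  # 26 uninfected cells flagged (false positives)

tp, fn = cm[0, 0], cm[0, 1]    # for the Parasitized class
fp = cm[1, 0]

precision = tp / (tp + fp)     # how many flagged cells were truly parasitized
recall = tp / (tp + fn)        # how many parasitized cells were caught

print(round(precision, 3), round(recall, 3))  # → 0.98 0.99
```

For the Parasitized class, recall is the fraction of truly infected cells the model catches, which is why maximizing it is what minimizes false negatives.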
Now, let us try to use a pretrained model like VGG16 and check how it performs on our data.
from tensorflow.keras.applications.vgg16 import VGG16
model_vgg = VGG16(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
model_vgg.trainable = False
model_vgg.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 64, 64, 3)] 0
block1_conv1 (Conv2D) (None, 64, 64, 64) 1792
block1_conv2 (Conv2D) (None, 64, 64, 64) 36928
block1_pool (MaxPooling2D) (None, 32, 32, 64) 0
block2_conv1 (Conv2D) (None, 32, 32, 128) 73856
block2_conv2 (Conv2D) (None, 32, 32, 128) 147584
block2_pool (MaxPooling2D) (None, 16, 16, 128) 0
block3_conv1 (Conv2D) (None, 16, 16, 256) 295168
block3_conv2 (Conv2D) (None, 16, 16, 256) 590080
block3_conv3 (Conv2D) (None, 16, 16, 256) 590080
block3_pool (MaxPooling2D) (None, 8, 8, 256) 0
block4_conv1 (Conv2D) (None, 8, 8, 512) 1180160
block4_conv2 (Conv2D) (None, 8, 8, 512) 2359808
block4_conv3 (Conv2D) (None, 8, 8, 512) 2359808
block4_pool (MaxPooling2D) (None, 4, 4, 512) 0
block5_conv1 (Conv2D) (None, 4, 4, 512) 2359808
block5_conv2 (Conv2D) (None, 4, 4, 512) 2359808
block5_conv3 (Conv2D) (None, 4, 4, 512) 2359808
block5_pool (MaxPooling2D) (None, 2, 2, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
# REF: https://towardsdatascience.com/transfer-learning-with-vgg16-and-keras-50ea161580b4
model4 = Sequential([
    model_vgg,
    # NOTE: placed after the frozen VGG base, this layer rescales the base's
    # activations rather than the raw pixels; ideally Rescaling(1./255) would
    # come before model_vgg so the pretrained weights see normalized images
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.1),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax') # 2 output units because we have 2 classes
])
Compiling the model
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model4.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model4.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 2, 2, 512) 14714688
rescaling_2 (Rescaling) (None, 2, 2, 512) 0
flatten_2 (Flatten) (None, 2048) 0
dense_6 (Dense) (None, 128) 262272
dropout_2 (Dropout) (None, 128) 0
dense_7 (Dense) (None, 64) 8256
dense_8 (Dense) (None, 2) 130
=================================================================
Total params: 14,985,346
Trainable params: 270,658
Non-trainable params: 14,714,688
_________________________________________________________________
Using callbacks
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]
Fit and Train the model
epochs = 15
history_4 = model4.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    validation_data=datagen.flow(X_val, y_val, batch_size=128),
    epochs=epochs,
    callbacks=callbacks,
)
Training output (condensed): val_accuracy improved from 0.9433 at epoch 1 to a best of 0.9601 at epoch 7 (val_loss 0.1129), with the checkpoint saved to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5 on each improvement. No further improvement followed, and early stopping halted training at epoch 12 of 15 (loss 0.1190, accuracy 0.9579, val_accuracy 0.9579).
Plot the train and validation accuracy
plt.plot(history_4.history['accuracy'])
plt.plot(history_4.history['val_accuracy'])
plt.legend(['Train', 'Validation'])
plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
Observations and insights:
The model starts out with high accuracy but cannot quite reach the peak the previous model achieved, though it comes close. One likely reason is that the frozen VGG16 base produces generic ImageNet features, and the small classification head may not have enough capacity to adapt them to blood-smear images; adding more layers on top of the base to extract and recombine these features could help. However, this was not attempted due to computing resource constraints.
Evaluating the model
loss4, accuracy4 = model4.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy4)
21/21 - 1s - loss: 0.1020 - accuracy: 0.9615 - 1s/epoch - 60ms/step 0.9615384340286255
Plotting the classification report and confusion matrix
y_predictions = model4.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # index of the highest probability in each row (axis=1 spans the class columns)
21/21 - 0s - 467ms/epoch - 22ms/step
conf_matrix_model4 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model4, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
print(classification_report(y_test, y_pred=y_predictions))
              precision    recall  f1-score   support

         0.0       0.97      0.95      0.96      1300
         1.0       0.95      0.97      0.96      1300

    accuracy                           0.96      2600
   macro avg       0.96      0.96      0.96      2600
weighted avg       0.96      0.96      0.96      2600
Think about it:
What observations and insights can be drawn from the confusion matrix and classification report?
Choose the model with the best accuracy scores from all the above models and save it as a final model.
# Best model was saved as part of callback method for model 3, with data augmentation.
model_df = pd.DataFrame({'Model 0': [accuracy0, loss0],
                         'Model 1': [accuracy1, loss1],
                         'Model 2': [accuracy2, loss2],
                         'Model 3': [accuracy3, loss3],
                         'Model 4': [accuracy4, loss4]},
                        index=['Accuracy', 'Loss'])
model_df
|          | Model 0  | Model 1  | Model 2  | Model 3  | Model 4  |
|----------|----------|----------|----------|----------|----------|
| Accuracy | 0.966538 | 0.973846 | 0.523462 | 0.985000 | 0.961538 |
| Loss     | 0.096982 | 0.080863 | 3.414724 | 0.048173 | 0.102036 |
Observations and Conclusions drawn from the final model:
We can see clearly that Model 3, with data augmentation and the dropout layer, performs the best, even better than Model 4 (the transfer learning VGG16 model), all while being significantly smaller (~540K parameters vs. ~15M parameters).
As noted in the Model 3 evaluation above, the data-augmented model reaches 98.5% test accuracy with 99% recall on Parasitized instances, meaning false negatives, the worst-case outcome in malaria diagnosis, are minimized, while the slightly higher false-positive rate on Parasitized instances is an acceptable trade-off in a healthcare context.
Improvements that can be done:
Insights
Refined insights:
Comparison of various techniques and their relative performance:
Proposal for the final solution design: